Iterative Part-of-Speech Tagging
Abstract
Assigning a category (tag) to a given word depends on the word itself and on the categories of neighboring words. A theory that assigns tags to a given text can naturally be viewed as a recursive logic program. This article describes how iterative induction, a technique that has proven powerful in the synthesis of recursive logic programs, has been applied to the task of part-of-speech tagging. The main strategy consists of inducing a succession T1, T2, ..., Tn of theories, using all the previously induced theories in the induction of theory Ti. Each theory in the sequence may have lexical rules, context rules, and hybrid ones. This iterative strategy is, to a large extent, independent of the underlying inductive algorithm. Here we consider one particular relational learning algorithm, CSC(RC), and we induce first-order theories from positive examples and background knowledge that are able to successfully tag a relatively large corpus in Portuguese.
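The iterative strategy described in the abstract can be illustrated with a minimal sketch. This is not CSC(RC): a toy frequency-based learner stands in for the relational learning algorithm, and the function names (`features`, `apply_theories`, `induce_theory`, `iterative_induction`), the rule encoding, and the small English corpus are all assumptions made for illustration. Each theory Ti is induced only from tokens that T1..Ti-1 leave untagged, and its context and hybrid rules rely on the neighbor tags those earlier theories assign.

```python
from collections import Counter, defaultdict

def features(word, left_tag):
    """Candidate rule conditions (hypothetical encoding): lexical, hybrid, context."""
    feats = [("lex", word)]
    if left_tag is not None:
        feats.append(("hyb", left_tag, word))  # left tag + word
        feats.append(("ctx", left_tag))        # left tag only
    return feats

def apply_theories(words, theories):
    """Apply T1..Tk in order; a token left as None is still untagged."""
    tags = [None] * len(words)
    for rules in theories:
        prev = list(tags)  # each theory sees only tags from earlier theories
        for i, word in enumerate(words):
            if prev[i] is not None:
                continue
            left = prev[i - 1] if i > 0 else "<s>"
            for feat in features(word, left):
                if feat in rules:
                    tags[i] = rules[feat]
                    break
    return tags

def induce_theory(corpus, theories):
    """Toy stand-in learner: keep conditions that always co-occur with one tag."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = [w for w, _ in sentence]
        current = apply_theories(words, theories)
        for i, (word, gold) in enumerate(sentence):
            if current[i] is not None:
                continue  # already covered by an earlier theory
            left = current[i - 1] if i > 0 else "<s>"
            for feat in features(word, left):
                counts[feat][gold] += 1
    return {f: next(iter(c)) for f, c in counts.items() if len(c) == 1}

def iterative_induction(corpus, max_theories=5):
    """Induce T1, T2, ... until no untagged training tokens yield new rules."""
    theories = []
    for _ in range(max_theories):
        rules = induce_theory(corpus, theories)
        if not rules:
            break
        theories.append(rules)
    return theories

# Toy corpus (an assumption; the article uses a Portuguese corpus).
corpus = [
    [("the", "DET"), ("can", "NOUN"), ("fell", "VERB")],
    [("she", "PRON"), ("can", "AUX"), ("run", "VERB")],
    [("the", "DET"), ("dog", "NOUN"), ("can", "AUX"), ("run", "VERB")],
]
theories = iterative_induction(corpus)
print(apply_theories(["she", "can", "run"], theories))
```

In this toy run the ambiguous word "can" receives no lexical rule in the first theory; it is only resolved by the second theory, whose hybrid and context rules depend on the tags the first theory assigns to its left neighbor.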
Similar resources
A Part-of-Speech Tagging System for the Persian Language
Abstract: Part-of-speech (POS) tagging is an essential task for many models and methods in other areas of natural language processing, such as machine translation, spell checking, text-to-speech, and automatic speech recognition. So far, highly accurate POS taggers have been created for many languages. In this paper, we focus on POS tagging in the Persian language. Because of problems in Persian POS t...
An improved joint model: POS tagging and dependency parsing
Dependency parsing is a form of syntactic parsing that automatically analyzes the dependency structure of natural-language sentences, producing a dependency graph for each input sentence. Part-of-speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers perform POS tagging and dependency parsing in a pipeline mode. Unfortunately, in pipel...
Unsupervised Part of Speech Tagging for Persian
In this paper we present a rather novel unsupervised method for part-of-speech (hereafter POS) disambiguation, which has been applied to Persian. This method, known as the Iterative Improved Feedback (IIF) model, is a heuristic that takes as input only a raw corpus of Persian together with all possible tags for every word in that corpus. During the process of tagging, the algorithm passes through sever...
Probabilistic tagging of minority language data: a case study using Qtag
While probabilistic methods of part-of-speech tag assignment have long received consideration in corpus and computational-linguistic research, less attention would appear to have been paid to date to the development of tagging accuracy over rounds of iterative, interactive training in applications of these methods. Understanding this aspect of probabilistic tagging is arguably of particular imp...
Handling Sparse Data by Successive Abstraction
A general, practical method for handling sparse data that avoids held-out data and iterative reestimation is derived from first principles. It has been tested on a part-of-speech tagging task and outperformed (deleted) interpolation with context-independent weights, even when the latter used a globally optimal parameter setting determined a posteriori.
Publication date: 1999